Back-Propagation Without Weight Transport
Authors

Abstract
In back-propagation (Rumelhart et al., 1985), connection weights are used both to compute node activations and to compute error gradients for hidden units. Grossberg (1987) has argued that this dual use of the same synaptic connections ("weight transport") constitutes a bidirectional flow of information through synapses, which is biologically implausible. In this paper we formally and empirically demonstrate the feasibility of an architecture equivalent to back-propagation, but without the assumption of weight transport. Through coordinated training with weight decay, a reciprocal layer of weights evolves into a copy of the forward connections and acts as the conduit for backward-flowing corrective information. Examination of networks trained with dual weights suggests that functional synchronization, not weight synchronization, is crucial to the operation of back-propagation methods.

Introduction

Back-propagation (Rumelhart et al., 1985) is a popular gradient descent method for tuning the behavior of connectionist networks. When interpreted as neurons and synapses, the nodes and links of back-propagation appear biologically implausible: private synaptic weights must be communicated to another part of the system. Grossberg (1987, p. 50) argues that "such a physical transport of weights has no plausible physical interpretation" within the known biology of neural systems. By modulating the distribution of error signals to previous layers with a separate set of backward weights, and by synchronizing the forward and backward weights with weight decay, our learning system avoids the biological implausibility of weight transport. We have tested this system in a variety of training situations, including training deep networks (more than one hidden layer) to learn exclusive-or, the 8-3-8 decoder problem, the NETtalk dictionary training set, the two-spirals problem, a recursive auto-associative memory (Pollack, 1990), and training a sequential cascaded network to learn the seven Tomita languages (Pollack, 1992). Finally, we examine several networks generated by the dual-weight learning system and discover that the weights modulating the backward flow of error-correcting signals do not need to be equal to the forward weights. What is important is that these weights provide the same functionality as the forward weights for the activations that actually pass through the network, not for all possible activation patterns.

Forward and Backward Weights

Back-propagation requires a forward pass through the network to generate the network's output response, followed by a backward pass to calculate the weight gradient for error minimization. As illustrated in Figure 1, the weight values used during the forward pass must be communicated to the error distribution mechanism, and such a mechanism is infeasible under a neural interpretation of weights as synapses. One possible solution to the weight transport problem provides both processes with their own copies of the weights and transmits only the weight changes to them. Both Parker (1985) and Zipser and Rumelhart (1990) have considered networks which maintain two weight sets: one, $w_{i,j}$, for the forward pass and one, $w'_{i,j}$, for the backward pass. These pairs of corresponding weights, however, were initialized and kept equal at all times, i.e. $w_{i,j} = w'_{i,j}$. Unfortunately, this merely pushes the weight transport problem backward in time to network genesis, from an ontogenetic correctness issue to a morphogenetic one! What developmental mechanism guarantees equality between two corresponding weights?
In this paper, we first show that it is possible to relax the equality restriction and consider only the asymptotic case, i.e. for all $i,j$, $\lim_{t \to \infty} \bigl( w_{i,j}(t) - w'_{i,j}(t) \bigr) = 0$. We show that, besides its other uses (Hinton & Sejnowski, 1986; Hinton, 1987; Krogh & Hertz, 1992; MacKay, 1992; Moody, 1992), weight decay can synchronize each weight pair, as outlined in Figure 2. Weight decay is defined simply as $w_{i,j}(t+1) = \gamma\, w_{i,j}(t) + \Delta w_{i,j}(t)$, where $\gamma$ is the decay constant. In physical systems, such as the brain or VLSI chips, maintaining stable analog values requires elaborate mechanisms. By adopting weight decay, designers may bypass these complex feats of engineering and exploit the natural dynamics of their medium. The following argument demonstrates that the difference between corresponding weights in the new system converges to zero.
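In outline, assuming as above that both members of a weight pair receive the same weight change $\Delta w_{i,j}(t)$ and decay with the same constant $0 < \gamma < 1$, the difference between them evolves as

\[
w_{i,j}(t+1) - w'_{i,j}(t+1)
  = \bigl(\gamma\, w_{i,j}(t) + \Delta w_{i,j}(t)\bigr) - \bigl(\gamma\, w'_{i,j}(t) + \Delta w_{i,j}(t)\bigr)
  = \gamma\,\bigl(w_{i,j}(t) - w'_{i,j}(t)\bigr),
\]

so any initial discrepancy is multiplied by $\gamma$ at every step: $w_{i,j}(t) - w'_{i,j}(t) = \gamma^{t}\,\bigl(w_{i,j}(0) - w'_{i,j}(0)\bigr) \to 0$ as $t \to \infty$.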
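As an illustration, the dual-weight scheme can be written in a few lines of Python/NumPy for the exclusive-or task mentioned above. This is a minimal sketch, not the authors' implementation: the layer sizes, learning rate, decay constant, and names such as W2 and B2 are assumptions made for the example, and only the hidden-to-output weights are given a backward counterpart, since the input layer receives no error signal.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_in, n_hid, n_out = 2, 4, 1
# Extra column in each matrix is for a bias unit whose activation is fixed at 1.
W1 = rng.normal(0.0, 0.5, (n_hid, n_in + 1))   # forward weights, input -> hidden
W2 = rng.normal(0.0, 0.5, (n_out, n_hid + 1))  # forward weights, hidden -> output
B2 = rng.normal(0.0, 0.5, (n_out, n_hid + 1))  # independently initialized backward weights

gamma = 0.9999  # weight decay constant (assumed value)
eta = 0.5       # learning rate (assumed value)

# Exclusive-or training patterns, one of the tasks mentioned in the text.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

for epoch in range(20000):
    for x, t in zip(X, T):
        # Forward pass uses only the forward weights W1, W2.
        x1 = np.append(x, 1.0)            # input plus bias
        h = sigmoid(W1 @ x1)
        h1 = np.append(h, 1.0)            # hidden activations plus bias
        y = sigmoid(W2 @ h1)

        # Backward pass: the hidden-unit error is routed through the backward
        # weights B2, never through W2 itself (no weight transport).
        delta_out = (t - y) * y * (1.0 - y)
        delta_hid = (B2[:, :n_hid].T @ delta_out) * h * (1.0 - h)

        # Both members of the pair (W2, B2) receive the same change and the
        # same decay, so |W2 - B2| shrinks by a factor gamma at every step.
        dW2 = eta * np.outer(delta_out, h1)
        W2 = gamma * W2 + dW2
        B2 = gamma * B2 + dW2
        W1 = gamma * W1 + eta * np.outer(delta_hid, x1)

print("max |W2 - B2| after training:", float(np.abs(W2 - B2).max()))
for x, t in zip(X, T):
    y = sigmoid(W2 @ np.append(sigmoid(W1 @ np.append(x, 1.0)), 1.0))
    print(x, "->", round(float(y[0]), 3), "target", float(t[0]))

Because W2 and B2 receive identical changes and identical decay, their difference shrinks geometrically regardless of how the backward weights were initialized; whether the pair must become numerically equal, or merely functionally equivalent over the activations that actually flow through the network, is the distinction the paper goes on to examine.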
Similar Articles
Physical modelling of caving propagation process and damage profile ahead of the cave-back
The assessment of rock mass cavability and the indication of the damage profile ahead of a cave-back are of great importance in the evaluation of a caving mine operation and can influence all aspects of the mine operation. Due to the lack of access to the caved zones, our current knowledge about the damage profile in caved zones is very limited. Among the different approaches available, p...
Back Propagation is Sensitive to Initial Conditions
This paper explores the effect of initial weight selection on feed-forward networks learning simple functions with the back-propagation technique. We first demonstrate, through the use of Monte Carlo techniques, that the magnitude of the initial condition vector (in weight space) is a very significant parameter in convergence time variability. In order to further understand this result, additio...
Improved Back Propagation Algorithm to Avoid Local Minima in Multiplicative Neuron Model
The back propagation algorithm calculates the weight changes of artificial neural networks, and a common approach is to use a training algorithm consisting of a learning rate and a momentum factor. The major drawbacks of the above learning algorithm are the problems of local minima and slow convergence speed. The addition of an extra term, called a proportional factor, reduces the convergence of th...
Initial Classification Through Back Propagation In a Neural Network Following Optimization Through GA to Evaluate the Fitness of an Algorithm
An Artificial Neural Network classifier is a nonparametric classifier. It does not need any a priori knowledge regarding the statistical distribution of the classes in a given selected data source. Moreover, a neural network can be trained to distinguish the classification criteria in a generalized manner that allows successful classification of newly arrived inputs not used during training. T...
On the use of back propagation and radial basis function neural networks in surface roughness prediction
Various types of artificial neural networks are examined and compared for the prediction of surface roughness in manufacturing technology. The aim of the study is to evaluate different kinds of neural networks and observe their performance and applicability on the same problem. More specifically, feed-forward artificial neural networks are trained with three different back propagation algorithms, ...